Introduction: This analysis is based on the outputs of pairwise comparisons of differential gene expression generated by this template. It uses the results of 3 pairwise comparisons, each between a sample group and its corresponding control group, and compares how the 3 sample groups differ from each other in terms of their sample-control differences (delta-delta). An example of such an analysis is the different responses of 3 cell types to treatment with the same drug. The analysis focuses on the overlap of differential expression at both the gene and gene set levels.
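The delta-delta idea can be illustrated with a minimal sketch. The group means below are made-up log2 values for a single gene, and the variable names are illustrative only; none of them come from this data set.

```r
# Hypothetical log2 group means of one gene in two cell types (illustrative values only)
means <- rbind(
  B_Cell = c(Control = 5.0, SLE = 7.0),
  T_Cell = c(Control = 6.0, SLE = 6.5)
)

# Delta: sample-vs-control difference within each cell type (a log ratio)
delta <- means[, "SLE"] - means[, "Control"]

# Delta-delta: how the sample-control difference differs between the two cell types
delta_delta <- unname(delta["B_Cell"] - delta["T_Cell"])

print(delta)
print(delta_delta)
```

A delta-delta near zero would suggest that the two cell types respond similarly, while a large absolute value suggests a cell type-specific response.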

 


1 Description

1.1 Project

Transcriptome in immune cells of control-patient samples

1.2 Data

RNA-seq data were generated from 3 types of immune cells from 3 controls and 3 patients. Raw data were processed to obtain gene-level read counts. Pairwise comparisons between controls and patients were performed within each immune cell type.

1.3 Analysis

This is a demo.

1.4 Pairwise comparisons

This report compares the results of the following pairwise comparisons.

Table 1. Information about all 3 individual comparisons, including comparison name, group names, group size, total number of genes, etc.
Name      Group1   Group2  Num1  Num2  Num_Gene  Test   Paired  DEG_Higher  DEG_Lower
B_Cell    Control  SLE     3     3     23272     EdgeR  FALSE   984         1853
T_Cell    Control  SLE     3     3     23272     EdgeR  FALSE   693         671
Monocyte  Control  SLE     3     3     23272     EdgeR  FALSE   501         465

2 Gene-level comparison

2.1 Global delta-delta correlation

Each comparison reported the log ratio of the sample and control group means for each gene. The global agreement of the log ratios across all genes indicates how similar or different the results of these 3 comparisons are. The full table of side-by-side gene-level statistics is here.

  • Corr(B_Cell:T_Cell) = 0.244
  • Corr(B_Cell:Monocyte) = 0.208
  • Corr(T_Cell:Monocyte) = 0.26
plot of chunk log_ratio
Figure 1. This plot shows the global correlation (correlation coefficients = 0.244, 0.208, 0.26) between the 3 pairwise comparisons: B_Cell, T_Cell, and Monocyte. The same 3D plot is shown from 2 different angles. Genes with p values less than 0.01 in any 1, any 2, or all 3 comparisons are highlighted in yellow, orange, or red, respectively. The correlation coefficients between the log-ratios of each pair of comparisons are listed above.
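The pairwise correlations and the highlighting rule described above could be computed along the following lines. This is only a sketch on simulated values: it assumes Pearson correlation (the report does not state which coefficient was used), and the data, the grey colour for non-significant genes, and all object names are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated log ratios (SLE vs. control) for the 3 comparisons; not real data
shared <- rnorm(n)
lr <- data.frame(
  B_Cell   = shared + rnorm(n, sd = 2),
  T_Cell   = shared + rnorm(n, sd = 2),
  Monocyte = shared + rnorm(n, sd = 2)
)

# Simulated p values, one column per comparison
pv <- matrix(runif(n * 3), ncol = 3, dimnames = list(NULL, names(lr)))

# Pairwise correlation between the log ratios (Pearson is an assumption)
cr <- cor(lr)
pair_idx <- combn(names(lr), 2)
for (i in seq_len(ncol(pair_idx))) {
  a <- pair_idx[1, i]
  b <- pair_idx[2, i]
  cat(sprintf("Corr(%s:%s) = %.3f\n", a, b, cr[a, b]))
}

# Count the comparisons in which each gene has p < 0.01, then map the count to a colour
n_sig <- rowSums(pv < 0.01)
highlight <- c("grey", "yellow", "orange", "red")[n_sig + 1]
print(table(highlight))
```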

2.2 Differentially expressed genes (DEGs)

Each comparison identified DEGs between its 2 compared groups. See the reports of the individual comparisons for how the DEGs were selected. DEGs identified by all 3 comparisons are worth a closer look.

Table 2. Number of DEGs identified by each comparison.
Comparison  Total_gene  P < 0.01  DEG, > control  DEG, < control
B_Cell      23272       3235      984             1853
T_Cell      23272       2130      693             671
Monocyte    23272       1693      501             465
plot of chunk deg_overlap_up

Figure 2A. Overlap of DEGs with higher expression than in the control groups. Click links to view overlapping genes:

plot of chunk deg_overlap_dn

Figure 2B. Overlap of DEGs with lower expression than in the control groups. Click links to view overlapping genes:
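The overlap counts behind such Venn diagrams can be derived from plain set operations. The sketch below uses made-up gene identifiers and hypothetical object names; it is not the code that produced the figures.

```r
# Hypothetical DEG lists, one per comparison (gene IDs are made up)
deg <- list(
  B_Cell   = c("g1", "g2", "g3", "g4"),
  T_Cell   = c("g2", "g3", "g5"),
  Monocyte = c("g3", "g4", "g5", "g6")
)

# Membership matrix: rows are genes, columns are comparisons
all_genes <- sort(unique(unlist(deg)))
member <- sapply(deg, function(x) all_genes %in% x)
rownames(member) <- all_genes

# DEGs identified by all 3 comparisons
shared_all <- Reduce(intersect, deg)

# Number of genes identified by exactly 1, 2, or 3 comparisons
n_hit <- table(factor(rowSums(member), levels = 1:3))

print(shared_all)
print(n_hit)
```

Running the same steps separately on the higher-than-control and lower-than-control DEG lists would give the counts for each direction.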

2.3 ANOVA

Two-way ANOVA was performed to identify genes that respond to SLE differently in different cell types. The analysis reported 3 p values per gene, corresponding to the effect of SLE, the effect of Cell, and their interaction. It identified 1513 genes with significant interaction p values (less than 0.01). The full ANOVA results are summarized in a table here.

plot of chunk aov_top
Figure 3. Examples: the 4 genes with the most significant interaction p values, among the genes with significant differential expression in at least one of the three pairwise comparisons.
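For a single gene, a two-way ANOVA with an interaction term can be sketched with base R's aov(). The expression values below are simulated and the object names are hypothetical; the report does not state which function or model options were actually used.

```r
set.seed(42)

# Simulated design: 3 cell types x 2 disease states x 3 replicates (values are made up)
d <- expand.grid(
  rep  = 1:3,
  SLE  = c("Control", "SLE"),
  Cell = c("B_Cell", "T_Cell", "Monocyte")
)

# Simulated gene: up in SLE B cells, down in SLE T cells, unchanged in monocytes
effect <- ifelse(d$SLE == "SLE" & d$Cell == "B_Cell", 3,
                 ifelse(d$SLE == "SLE" & d$Cell == "T_Cell", -3, 0))
d$expr <- 8 + effect + rnorm(nrow(d), sd = 0.3)

# Two-way ANOVA: main effects of SLE and Cell, plus their interaction
fit <- aov(expr ~ SLE * Cell, data = d)
tab <- summary(fit)[[1]]

# The first 3 rows of the ANOVA table are SLE, Cell, and SLE:Cell, in that order
p <- setNames(tab[["Pr(>F)"]][1:3], c("SLE", "Cell", "SLE:Cell"))
print(p)
```

A small interaction p value indicates that the SLE effect depends on the cell type, which is the pattern this section looks for.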

3 Gene set-level comparison

Genes are often grouped into pre-defined gene sets according to their function, interaction, location, etc. Analysis can then be performed on the genes in a gene set as a unit instead of on individual genes.

3.1 Gene set average

This analysis averages the differential expression of genes in the same gene set. The gene set-level means of the log-ratios are summarized in this table here.

plot of chunk geneset_average_plot

Figure 4. Each dot represents a gene set and the average log-ratio of all genes in this gene set. The same 3D plot is shown from 2 different angles.

  • Corr(B_Cell:T_Cell) = 0.467
  • Corr(B_Cell:Monocyte) = 0.313
  • Corr(T_Cell:Monocyte) = 0.388
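A gene set average of this kind can be sketched as the column means of the per-gene log ratios restricted to each set. The log ratios, gene IDs, and gene sets below are made up for illustration.

```r
# Hypothetical per-gene log ratios from the 3 comparisons (values are made up)
lr <- data.frame(
  B_Cell   = c(1.0, 2.0, -1.0, 0.5),
  T_Cell   = c(0.0, 1.0, -2.0, 0.5),
  Monocyte = c(0.5, 0.5, 0.0, -0.5),
  row.names = c("g1", "g2", "g3", "g4")
)

# Hypothetical gene sets
gene_sets <- list(
  set_A = c("g1", "g2"),
  set_B = c("g2", "g3", "g4")
)

# Mean log ratio of the member genes, per gene set and comparison
set_mean <- t(sapply(gene_sets, function(g) {
  colMeans(lr[intersect(g, rownames(lr)), , drop = FALSE])
}))

print(set_mean)
```

With many gene sets, cor(set_mean) would give pairwise correlations analogous to those listed above.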

3.2 Gene set over-representation analysis (ORA)

Each 2-group comparison performed gene set over-representation analysis (ORA), which identifies gene sets over-represented among differentially expressed genes. The ORA results of all 3 comparisons are summarized and compared here. The ORA of each gene set reports an odds ratio and a p value. These statistics from the 3 comparisons were combined and listed side-by-side, along with the differences of their odds ratios and the ratios of their p values (p set to 0.5 when not available), in this table here.
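The odds ratio and p value of a single gene set can be sketched with a 2x2 table and a one-sided Fisher's exact test. This is a common choice for ORA and is assumed here; the report does not state which test was used, and the counts below are made up.

```r
# Hypothetical counts (made up for illustration)
n_total   <- 1000  # genes tested
n_deg     <- 100   # differentially expressed genes
n_set     <- 50    # genes in the gene set
n_overlap <- 20    # DEGs that are also in the gene set

# 2x2 contingency table: DEG status (rows) by gene set membership (columns)
tab <- matrix(
  c(n_overlap,         n_set - n_overlap,
    n_deg - n_overlap, n_total - n_set - n_deg + n_overlap),
  nrow = 2,
  dimnames = list(DEG = c("yes", "no"), In_set = c("yes", "no"))
)

# One-sided test for over-representation of DEGs in the gene set
ft <- fisher.test(tab, alternative = "greater")
odds_ratio <- unname(ft$estimate)  # conditional maximum likelihood estimate
p_value <- ft$p.value

print(tab)
print(c(odds_ratio = odds_ratio, p_value = p_value))
```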

Table 2. Gene sets were broken down into subgroups by their sources. Click on the numbers of over-represented gene sets to see a full list.
Source B_Cell::Higher_in_Control B_Cell::Higher_in_SLE T_Cell::Higher_in_Control T_Cell::Higher_in_SLE Monocyte::Higher_in_Control Monocyte::Higher_in_SLE
BioSystems 438 3212 921 479 684 1095
KEGG 40 319 49 119 48 164
MSigDb 857 4125 1565 686 1024 2024
OMIM 0 1 0 0 0 0
PubTator 123 7634 632 892 1726 1953
plot of chunk ora_overlap_up

Figure 5A. Overlap, across all 3 comparisons, of gene sets over-represented among up-regulated genes. Click links to view overlapping gene sets:

plot of chunk ora_overlap_dn

Figure 5B. Overlap, across all 3 comparisons, of gene sets over-represented among down-regulated genes. Click links to view overlapping gene sets:

3.3 Gene set enrichment analysis (GSEA)

Each 2-group comparison performed gene set enrichment analysis (GSEA) on genes ranked by their differential expression. The GSEA results of all 3 comparisons are summarized and compared here. The GSEA of each gene set reports an enrichment score and a p value. These statistics from the 3 comparisons were combined and listed side-by-side in this table here.
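The enrichment score of a gene set can be illustrated with an unweighted, Kolmogorov-Smirnov-like running sum over the ranked gene list. This is a simplified sketch under that assumption, not the GSEA implementation used for this report; it omits the weighting, normalization, and permutation-based p values of a full GSEA, and the ranked list and gene set are made up.

```r
# Hypothetical ranked gene list (most up-regulated first) and gene set
ranked   <- paste0("g", 1:10)
gene_set <- c("g1", "g2", "g4")

# Walk down the ranked list: step up at gene set members, step down otherwise
in_set <- ranked %in% gene_set
n_hit  <- sum(in_set)
n_miss <- length(ranked) - n_hit
step   <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
running <- cumsum(step)

# Enrichment score: the running sum at its maximum deviation from zero
es <- running[which.max(abs(running))]

print(round(running, 3))
print(es)
```

A positive score means the gene set members concentrate near the top of the ranking; a negative score means they concentrate near the bottom.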

Table 3. Gene sets were broken down into subgroups by collections. Click on the numbers of enriched gene sets to see a full list.
Collection B_Cell::Higher_in_Control B_Cell::Higher_in_SLE T_Cell::Higher_in_Control T_Cell::Higher_in_SLE Monocyte::Higher_in_Control Monocyte::Higher_in_SLE
C0_Hallmark 2 37 12 3 4 15
C1_Positional 13 26 28 10 19 19
C2_BioCarta_Pathways 1 68 14 2 0 20
C2_Chemical_and_genetic_perturbations 36 1356 411 122 107 353
C3_MicroRNA_targets 0 51 3 5 4 4
C3_TF_targets 4 284 12 89 10 85
C4_Cancer_gene_neighborhoods 42 86 170 17 34 57
C4_Cancer_modules 10 176 78 18 28 65
C6_Oncogenic_signatures 2 116 9 17 20 20
C7_Immunologic_signatures 58 922 432 52 44 381
GO_BP 145 2065 437 239 321 498
GO_CC 67 159 120 34 37 50
GO_MF 44 359 87 71 73 121
KEGG_compound 4 126 41 47 44 84
KEGG_enzyme 1 1 2 3 2 1
KEGG_module 11 13 24 3 5 10
KEGG_pathway 9 161 27 25 7 57
KEGG_reaction 2 35 24 23 23 27
OMIM_gene 1 2 2 2 0 1
REACTOME 92 283 230 63 46 134
WikiPathways 2 91 13 5 9 21
plot of chunk nes

Figure 6. This plot shows the global correlation (correlation coefficients = 0.361, 0.2, 0.189) of normalized enrichment scores between the 3 pairwise comparisons: B_Cell, T_Cell, and Monocyte. The same 3D plot is shown from 2 different angles. Gene sets with p values less than 0.01 in any 1, any 2, or all 3 comparisons are highlighted in yellow, orange, or red, respectively. The correlation coefficients between the enrichment scores of each pair of comparisons are:

  • Corr(B_Cell:T_Cell) = 0.361
  • Corr(B_Cell:Monocyte) = 0.2
  • Corr(T_Cell:Monocyte) = 0.189
plot of chunk gsea_overlap_up

Figure 7A. Overlap, across all 3 comparisons, of gene sets enriched among up-regulated genes. Click links to view overlapping gene sets:

plot of chunk gsea_overlap_dn

Figure 7B. Overlap, across all 3 comparisons, of gene sets enriched among down-regulated genes. Click links to view overlapping gene sets:

3.4 Gene clustering

The top 1200 genes with significant ANOVA p values (p <= the cutoff given by the parameter prms$geneset$cluster$panova) were used as seeds for a gene-gene clustering analysis, and 12 clusters were identified. ORA was performed on the clusters to identify their functional associations (see table below).

Table 4. This table lists the number of genes in each cluster (click the numbers to see gene lists), the average expression of all genes in a cluster for each sample group (normalized so that the mean of each control group equals 0 and values are in units of standard deviations), and the number of gene sets over-represented in each cluster (click the numbers to see gene set lists).
ID          Size  B_Cell::Control  B_Cell::SLE  T_Cell::Control  T_Cell::SLE  Monocyte::Control  Monocyte::SLE  Gene_set
Cluster_1   108   0                1.5435       0                -1.3742      0                  1.1953         609
Cluster_2   98    0                1.3704       0                -1.4543      0                  -1.4314        2164
Cluster_3   480   0                1.6840       0                1.5731       0                  1.6065         1439
Cluster_4   123   0                1.6020       0                1.4107       0                  -1.2534        1069
Cluster_5   303   0                -1.6430      0                -1.6402      0                  -1.5918        2441
Cluster_6   18    0                0.3060       0                1.5256       0                  -1.3006        639
Cluster_7   122   0                1.5202       0                1.6649       0                  0.6400         1108
Cluster_8   65    0                -1.3774      0                1.3512       0                  -1.4642        1862
Cluster_9   22    0                -1.4537      0                1.5435       0                  0.1158         938
Cluster_10  63    0                -1.5032      0                -1.3938      0                  1.4290         1424
Cluster_11  58    0                -1.3088      0                1.0172       0                  1.5528         2119
Cluster_12  12    0                0.0941       0                -1.4782      0                  1.4278         308
plot of chunk clustering_heatmap
Figure 8. This plot shows the average expression level of each cluster. Data were normalized before the analysis so that, for each gene, the mean of the control groups was zero and the standard deviation across all samples was 1.0. Values indicate the number of standard deviations from the mean of the corresponding control group.
plot of chunk clustering_mean
Figure 9. This plot summarizes the group means and standard errors of all clusters.
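The normalization described for the cluster table and heatmap can be sketched for one cell type as follows. The expression values are simulated, the cluster labels are hypothetical, and the clustering step itself is not shown because the report does not specify the method.

```r
set.seed(7)

# Simulated expression of 6 genes in one cell type: 3 control and 3 SLE samples (made up)
grp <- rep(c("Control", "SLE"), each = 3)
expr <- matrix(
  rnorm(6 * 6, mean = 8), nrow = 6,
  dimnames = list(paste0("g", 1:6), paste0(grp, 1:3))
)
expr[1:3, grp == "SLE"] <- expr[1:3, grp == "SLE"] + 2  # genes 1-3 go up in SLE

# Per gene: subtract the control mean, then divide by the SD across all samples
ctrl_mean <- rowMeans(expr[, grp == "Control"])
z <- (expr - ctrl_mean) / apply(expr, 1, sd)

# Group means per gene; the control mean is 0 by construction
grp_mean <- sapply(c("Control", "SLE"), function(g) rowMeans(z[, grp == g]))

# Hypothetical cluster labels; average the group means within each cluster
cluster <- rep(c("Cluster_1", "Cluster_2"), each = 3)
cluster_mean <- rowsum(grp_mean, cluster) / as.vector(table(cluster))

print(round(cluster_mean, 4))
```

With 3 cell types, the same steps would be repeated per cell type against its own control group, which is why every control column in the table is 0.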

4 Appendix

Check out the RoCA home page for more information.

4.1 Reproduce this report

To reproduce this report:

  1. Find the data analysis template you want to use and an example of its accompanying YAML file here, and download the YAML example to your working directory.

  2. To generate a new report using your own input data and parameters, edit the following items in the YAML file:

    • output : where you want to put the output files
    • home : the URL if you have a home page for your project
    • analyst : your name
    • description : background information about your project, analysis, etc.
    • input : where your input data are located; read the instructions for preparing them
    • parameter : parameters for this analysis; read the instructions on how to prepare input data
  3. Run the code below in the R console or RStudio, preferably in a new R session:

# Install any missing packages, then load them
if (!require(devtools)) { install.packages('devtools'); library(devtools) }
if (!require(RCurl)) { install.packages('RCurl'); library(RCurl) }
if (!require(RoCA)) { install_github('zhezhangsh/RoCAR'); library(RoCA) }

# Placeholder path: point this at the YAML file you just downloaded and edited for your analysis
filename.yaml <- 'filename.yaml'
CreateReport(filename.yaml)

If no errors are reported, go to the output folder and open the index.html file to view the report.

4.2 Session information

## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DEGandMore_0.0.0.9000 snow_0.4-1            rchive_0.0.0.9000    
##  [4] VennDiagram_1.6.17    futile.logger_1.4.1   scatterplot3d_0.3-37 
##  [7] gplots_3.0.1          MASS_7.3-45           htmlwidgets_0.6      
## [10] DT_0.1                awsomics_0.0.0.9000   yaml_2.1.13          
## [13] rmarkdown_0.9.6       knitr_1.13            RoCA_0.0.0.9000      
## [16] RCurl_1.95-4.8        bitops_1.0-6          devtools_1.12.0      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5          magrittr_1.5         highr_0.6           
##  [4] stringr_1.0.0        caTools_1.17.1       tools_3.2.2         
##  [7] parallel_3.2.2       KernSmooth_2.23-15   lambda.r_1.1.7      
## [10] withr_1.0.2          htmltools_0.3.5      gtools_3.5.0        
## [13] digest_0.6.9         formatR_1.4          futile.options_1.0.0
## [16] memoise_1.0.0        evaluate_0.9         gdata_2.17.0        
## [19] stringi_1.1.1        jsonlite_0.9.22
